FINAL REPORT

Award No.: AFOSR F49620-93-1-0109
Period: 01/01/93 to 06/30/96

Principal Investigator: Christoph von der Malsburg
Computer Science Dept., Univ. of Southern California, Los Angeles


Objectives 


The following objectives were pursued in the project:


• Object recognition from digital camera images.

• Invariance with respect to position.

• Robustness with respect to distortion (rotation in depth, object
deformation).

• Robustness with respect to illumination, noise, partial occlusion, and
changing background.

• Preparation for massively parallel implementation.

Our  system 


Our system is fully described in an article that has been accepted for
publication [3]. See also the web book [2].

Our system for object recognition and scene analysis is based in an essential
way on the Morlet wavelet transformation. This is a series of convolutions
performed on input images. As convolution kernels, Morlet wavelets (also
called Gabor-type wavelets) are used. These form a resolution pyramid. A
single image point is then characterized by the vector of responses of all
kernels centered at that point. This vector is called a jet. A full array of
jets, one for each pixel, forms an “image domain.”
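
As a minimal sketch of the jet construction described above, the following Python/NumPy code builds complex Gabor-type kernels at a few scales and orientations and collects the response magnitudes at one pixel. The kernel formula, the parameter values (three levels, four orientations, sigma = 2*pi), the use of magnitudes only, and the function names `gabor_kernel` and `jet_at` are illustrative assumptions, not values taken from this report; see [3] for the parameters actually used.

```python
import numpy as np

def gabor_kernel(k, theta, sigma=2.0 * np.pi):
    """Complex Gabor-type (Morlet) kernel with wave number k, orientation theta."""
    half = int(np.ceil(3.0 * sigma / k))          # truncate at ~3 envelope widths
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    envelope = (k**2 / sigma**2) * np.exp(-(k**2) * (x**2 + y**2) / (2.0 * sigma**2))
    # Plane wave minus a constant that makes the kernel nearly DC-free.
    wave = np.exp(1j * k * (x * np.cos(theta) + y * np.sin(theta))) - np.exp(-sigma**2 / 2.0)
    return envelope * wave

def jet_at(image, row, col, n_levels=3, n_orient=4):
    """Magnitudes of all kernel responses centered at (row, col)."""
    jet = []
    for level in range(n_levels):
        k = (np.pi / 2.0) * 2.0 ** (-level / 2.0)  # coarser kernels at higher levels
        for o in range(n_orient):
            kern = gabor_kernel(k, np.pi * o / n_orient)
            half = kern.shape[0] // 2
            padded = np.pad(image, half, mode="constant")
            patch = padded[row:row + 2 * half + 1, col:col + 2 * half + 1]
            jet.append(abs(np.sum(patch * kern)))
    return np.array(jet)

# Usage: jet at the center of a synthetic vertical-stripe grating.
cols = np.arange(64)
image = np.tile(np.cos((np.pi / 2.0) * cols), (64, 1))
jet = jet_at(image, 32, 32)
```

Evaluating the kernels only at the pixel of interest keeps the sketch short; a full image domain would instead convolve the whole image with each kernel, for example via FFT.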

Objects to be recognized are stored as two-dimensional arrays of such jets,
which are extracted from the aspect of a sample image of the object. Different
objects (and if necessary different aspects of one object) are stored side by
side in a “model domain.”

An object appearing in the image domain is recognized by graph matching.
This is implemented with the help of a full matrix of connections (“dynamic
links”) between image points and all nodes in all objects. (Although this
full matrix is conceptually simple, a sparse matrix is used in actual
applications.) This matrix is reorganized as a dynamical system (see Table 1
in [3]). Nodes in a model and nodes (pixels) in the image are compared in
terms of their jets. Jet similarity is computed as a scalar product of
normalized jets. Matrix reorganization attempts to install one-to-one
connections between nodes in image and model; winning connections tend to
have maximal jet similarity, and neighboring nodes tend to be connected to
neighboring nodes. Thus, a (distorted) topological connection pattern is set
up between image and model(s). Different models compete for connections, and
the best-fitting model wins out in a winner-take-all scheme. All these
tendencies are implemented in a simple set of dynamical equations.
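
The jet similarity described above (a scalar product of normalized jets) can be sketched in a few lines of Python/NumPy. The sketch covers only the similarity measure and the matrix of pairwise similarities between image nodes and model nodes; it does not implement the dynamical equations of Table 1 in [3]. The function names and the array layout (one jet per row) are assumptions made for illustration.

```python
import numpy as np

def jet_similarity(j1, j2):
    """Scalar product of two jets after normalizing each to unit length."""
    return float(np.dot(j1, j2) / (np.linalg.norm(j1) * np.linalg.norm(j2)))

def similarity_matrix(image_jets, model_jets):
    """Pairwise similarities; rows index image nodes, columns index model nodes."""
    a = image_jets / np.linalg.norm(image_jets, axis=1, keepdims=True)
    b = model_jets / np.linalg.norm(model_jets, axis=1, keepdims=True)
    return a @ b.T

# Usage with synthetic jets: the image jets equal the model jets up to contrast.
rng = np.random.default_rng(0)
model_jets = np.abs(rng.normal(size=(5, 12)))   # 5 nodes, 12 coefficients each
image_jets = 3.0 * model_jets                    # same pattern, scaled amplitude
S = similarity_matrix(image_jets, model_jets)
```

Because each jet is normalized before the product is taken, a uniform change in amplitude (as in the scaled jets above) leaves the similarity unchanged.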

To handle different aspects of the same object (the object seen from different
angles), and to handle variations in one object type, multiple models are
stored (“bunch graph,” not part of this project). Several such models for one
object can be compacted economically into a “fusion graph.” This aspect of
our system has been implemented and perfected in other projects. Complex
scenes, and scenes with heavy object occlusion, necessitate a separate process
of figure-ground separation. This can be homogeneously integrated into the
system developed here (and it was part of the original project), but due
to severe budget cuts in the third year of this project that function was
not implemented here (although we have demonstrated the figure-ground
separation process in another project).

In separate work we have reduced the dynamic link matrix to a low-dimensional
entity that can be efficiently and quickly optimized on a digital machine. We
have been able to achieve real-time object recognition from video input (not
part of this project).


To a large extent, the system developed in this project has been shaped by
considerations for massively parallel implementation. In one extreme case, the
wavelet transformation can be implemented by a fully optical system that is
pixel-parallel and wavelet-sequential. Jet similarity can then be implemented
as a temporal correlation between local signals. With appropriate
photorefractive materials the dynamic link matrix can be efficiently
implemented. Further development is needed to realize this potential. In a
less advanced system, electronic analog signal processing architectures or DSP
arrays can be used to compute the wavelet transform in real time at high frame
rates and to process jet similarities at extremely high rates. Siemens is
developing a vision chip (SEE-1) that specifically targets our application.
That chip will be centered around an array of 128 DSPs, will have a sustained
computing power of 5 GFLOPS, and will enable us to analyze complex video
sequences in real time.


Accomplishments/New  Findings 


We have applied our system to the recognition of objects (human faces)
against large databases (more than 100 persons) with only one image per
person. We have achieved correct recognition with high reliability (in more
than 80% of the cases the correct gallery face was ranked as number one) in
spite of rotation in depth of up to 15 degrees and changes in facial expression
(see [3]). With depth rotation of 30°, the recognition rate fell to 66% correct
identification, a rate still comparable to that of human subjects.

We have been able to decisively improve the performance of our system with
the help of several important changes to its dynamics:


• Conspiracy of corresponding parts of stored images (second term on the
right-hand side of the h-equation, Table 1 in [3]).

• Replacement of additive signal combination by maximal signal (5th
term of the h-equation, Table 1 in [3]).

• Introduction of an “attention” variable to manage the region of interest
in the image (a-equation, Table 1 in [3]).

We have recently made the important discovery that object recognition in
our system can be sped up by a very large factor by forcing a decision
between the various stored models without reorganization of the image-model
matrix. According to our experiments, the recognition reliability is
then almost as high as with full matrix reorganization.
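
The report does not spell out how the decision between models is forced. Purely as an illustrative assumption, one simple reading is to score each stored model directly from the unreorganized jet similarities and pick the best score. In the Python/NumPy sketch below, the scoring rule (mean over model nodes of the best similarity to any image node) and the names `normalize_rows` and `rank_models` are hypothetical and are not the procedure of [3].

```python
import numpy as np

def normalize_rows(jets):
    """Scale every jet (row) to unit length."""
    return jets / np.linalg.norm(jets, axis=1, keepdims=True)

def rank_models(image_jets, models):
    """Rank stored models by a similarity score, best first.

    models maps a model name to an array with one jet per row.
    """
    img = normalize_rows(image_jets)
    scores = {}
    for name, model_jets in models.items():
        sim = normalize_rows(model_jets) @ img.T      # model nodes x image nodes
        scores[name] = float(sim.max(axis=1).mean())  # best image match per model node
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores
```

A call such as `rank_models(image_jets, {"a": jets_a, "b": jets_b})` returns the model names ordered by score together with the scores themselves. Unlike full matrix reorganization, this shortcut ignores the neighborhood (topology) constraint between nodes.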

Our system has several features that let it stand out compared to other object
recognition systems:


• No learning or training is required (although system performance can
be improved over the present level by training).

• Flexibility/generality with respect to object types: no manual
programming or construction of object models is required. The system is based
on the simple storage of images.

• Potential for object recognition from any angle or for better
generalization over object variation by storing (and consolidating) more views.

• In relation to conventional optical correlation methods: ability to
handle distortion, partial occlusion, and changing background.

• In distinction to other image understanding systems, ours is very
homogeneous, the computing-intensive parts having the form of convolutions
and scalar products of vectors. Our system thus supports easy and
economical implementation in parallel hardware.


Due to its flexibility, our system has the potential for important applications
in the military and civilian domains. Among these are:

Military applications:

• Target recognition

• Battle scene analysis

• Security access systems

Civilian applications:

• Area surveillance

• Security access systems (in fact, we are already engaged in commercial
application of a related system)

• Manufacturing (quality control, object manipulation)


Personnel 


Dr.  Michael  Lyons  (postdoc);  Laurenz  Wiskott  (Research  Assistant). 


Publications 


[1] L. Wiskott and C. von der Malsburg: Recognizing Faces by Dynamic
Link Matching. In: Proceedings of the International Conference on Artificial
Neural Networks, Paris 1995, pp. 347-352 (refereed contributed paper).

[2] L. Wiskott and C. von der Malsburg: Face Recognition by Dynamic Link
Matching. In: Lateral Interactions in the Cortex: Structure and Function.
Electronic book, J. Sirosh, R. Miikkulainen, and Y. Choe (editors),
chapter 4, 1995. (ISBN 0-9647060-0-8)
http://www.cs.utexas.edu/users/nn/web-pubs/htmlbook96/

[3] L. Wiskott and C. von der Malsburg: Face Recognition by Dynamic Link
Matching. Neural Computation. (Accepted for publication).


Interactions/Transitions

Consultative and advisory functions
None 

Transitions 

None 

New  discoveries,  inventions,  or  patent  disclosures 
None 

Honors/Awards

Pioneer Award 1994 of the Neural Networks Council of the IEEE.